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Abstract 

When analyzing interaction networks, it is common to interpret the amount of interaction between two nodes as 
the strength of their relationship. We argue that this interpretation may not be appropriate, since the interaction 
between a pair of nodes could potentially be explained only by characteristics of the nodes that compose the pair 
and, however, not by pair-specific features. In interaction networks, where edges or arcs are count-valued, the above 
scenario corresponds to a model of independence for the expected interaction in the network, and consequently we 
propose the notions of arc strength, and edge strength to be understood as departures from this model of indepen- 
dence. We discuss how our notion of arc/edge strength can be used as a guidance to study network structure, and 
in particular we develop a stochastic blockmodcl for directed interaction networks where arc strength is taken as a 
latent variable. We illustrate our approach by studying the interaction between the Kolkata users of the myGamma 
mobile social network. 

Key words and phrases: Attractiveness, Bootstrap, EM algorithm, Gregariousness, Social network, Social struc- 
ture, Stochastic blockmodel, Granovetter's tie strength, Valued graph. 
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In many scenarios it is possible to count the amount of interaction between individuals or, more generally, between 
nodes of a network. This interaction can be either directed or undirected. Examples of directed interaction networks 
include communication networks, where we can count the number of text messages, calls, or e-mails sent from 



individual to individual (e.g., Diesner and Carley. 2005 Tyler et al. 2005); and citation networks, where we can 



record the number of times certain blog links to another blog (e.g., Adamic and Glance 2005), or the number of 



times one author cites another (e.g., Ding 2011). Undirected interaction networks include collaboration networks, 



where we can study the number of papers coauthored by two scholars (e.g., Newman 2001) or the number of bills 



cosponsored by legislators (Fowler 2006); patient-sharing networks, where we record the number of patients shared 



by physicians (Barnctt ct al. 2012); and tree interaction networks, where we can record, for instance, the number 



of common fungal species two tree species can host (Mariadassou et al. 20101 
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When studying interaction networks, it seems natural to interpret the amount of interaction between two nodes 
as the strength of the arc, or of the edge, depending on the relation being directed, or undirected, respectively. This 



interpretation is rather common in practice (see, e.g., Mariadassou et al. 2010 Barnett et al. 2012 ), and it appears in 



textbooks on social network analysis (see, e.g., Wasserman and Faust 1994). We argue, however, that the interaction 



in a network could potentially be explained under a model of independence, where the expected interaction between 
nodes is modeled using only nodal characteristics, in which case the interpretation of the amount of interaction 
as the strength of the arc/edge would not be appropriate. In this article we present a model-based approach to 
the concepts of arc strength, and edge strength in directed, and undirected interaction networks, respectively. We 
propose to model the interaction between nodes in a way such that departures from a model of independence can be 
interpreted as arc/edge strength. The intuition for our approach is that, after controlling the nodal characteristics 
that account for the interaction in the network (such as gregariousness and attractiveness in directed networks), a 
larger arc/edge strength should lead to more interaction between nodes. This approach follows the long-standing 
tradition of establishing a null model for the network, and then interpreting departures from this null model as 



network structure. This tradition goes back at least to the work of Blau (1977), Rapoport (1980), and Strauss 



and Freeman (1989), and it has been used more recently by Heckathorn and Jeffri (2001), Zheng et al. (2006), and 



and Freeman 


(1989) 


DiPrete et al. 


(2011 



1.1 Arc/Edge Strength vs. Granovetter's Tie Strength 

The closely related concept of tie strength has received a lot of attention in the social sciences literature. The first 



definition of the strength of the tie between two individuals was given by Granovetter (1973): "the strength of a tie 
is a (probably linear) combination of the amount of time, the emotional intensity, the intimacy (mutual confiding), 
and the reciprocal services which characterize the tie." Although this definition has been operationalized via factor 



analysis (Marsden and Campbell 1984 Mathews et al. 1998), it is not appropriate for the class of networks that 



we study in this article. First of all, our approach aims to study arc/edge strength using the observed interaction 
between nodes, whereas Granovetter's definition typically requires data collected from cross-sectional surveys that 
aim to measure Granovetter's components of tie strength. Despite the fact that Granovetter's definition captures 
different characteristics that lead to consider a tie as strong, it is not applicable to networks where the nodes are not 
individuals. For instance, in blog citation networks, patient-sharing networks, or in tree interaction networks, as 
mentioned above, it is not clear what "emotional intensity" or "intimacy" would mean. Furthermore, Granovetter's 
definition does not control for nodal characteristics that may lead to more interaction (or more time spent) between 
nodes, such as node's gregariousness or attractiveness. Finally, Granovetter's definition implies that the tie from 
node i to j is as strong as the tie from j to i. We believe, however, that it is more natural to consider asymmetric 
definitions of strength, i.e., we should allow the strength from i to j to be different than from j to i, in particular 
when the interaction between nodes is directed. Thus, in this article we use the terms arc strength, and edge strength 
to avoid confusion with Granovetter's approach. 



1.2 Online Social Networks 

Online social networks, such as Facebook, Google+, or Linkedln, offer services focused on facilitating the interaction 
between users. We explore the ideas presented in this article using data from the myGamma mobile network. 
myGamma (http://in.mygcmima.com/) is a mobile social networking service provided by BuzzCity, a Singapore 
based company. |BuzzCity ( 2007 ) characterized the users of myGamma as people who access the Internet primarily 
via mobile phones, living in emerging markets or working in the blue collar sector in wealthier nations. Users 
declare friends and foes as directed links, and they interact via chats, messages, blogs, groups, games, etc. For the 
purposes of this article, we take users located in the city of Kolkata, India, and we focus on users who were using the 
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networking service prior to May 2010 and who were active during June 2010. In order to create a count interaction 
network, we take the number of chat messages sent during June 2010. In Figure [I] we present the scatterplot of the 
interaction between genders. This plot contains the 786 mixed-gender dyads with declared friendships (we exclude 
pairs with declared foe links, and one extremely outlying pair with more than 1,000 chat messages exchanged). The 
horizontal axis represents the chat messages going from females to males, and the vertical axis the chat messages 
going from males to females. The histograms located at each side represent the observed marginal distributions of 
interaction, and the diagonal line dividing the scatterplot has a slope of one and zero intercept. Figure [l] shows 
that only a few pairs of users with declared friendships have large amounts of interaction, whereas most of the 
interaction among them is null or pretty small. This distributional characteristic seems to be ubiquitous in online 



social networks (see, e.g., Gilbert and Karahalios 2009 Xiang et al. 2010). Furthermore, the asymmetry between 



males and females in terms of their amounts of interaction is apparent in Figure [TJ since males tend to send more 
chat messages to females than females to males. In Section [3] we use our approach to arc strength to construct a 
model for interaction counts among dyads, and we use it to study the distribution of arc strength among the pairs 
of Kolkata users with declared friendships in the myGamma network. 

1.3 Overview of the Article 

In Section [2] we describe in detail our ideal approach to edge and arc strength. This description contains in Section 
|2.1| the proposal of a null model that can be interpreted as a scenario where no differential values of arc strength are 
present. In Section [2.2| we show which are our ideal parameters of arc strength, but we argue that such approach is 



not feasible, and so in Section 2.3 we present a discussion on possible alternatives that conserve the nature of our 
ideal approach. In Section [2. 4| we briefly present our approach to edge strength for undirected interaction networks, 
although in the remainder of the article we focus on directed interaction networks. We introduce in Section [3] the 
latent arc strength stochastic blockmodels (LASSB) as a sensible alternative to our ideal approach to arc strength. In 
Section [4] we use the LASSB to study the distribution of arc strength in the myGamma Kolkata network presented 
in Section |1.2[ and in Section |4.1| we present a simulation study to assess the goodness of fit of our model. Section 
[5] contains some conclusions, and a discussion on issues of our approach. Finally, in Appendix [A] we present an EM 
algorithm to fit the LASSB via maximum likelihood. 



2 The Strength of Arcs and Edges 

We focus on a network formed by a set of n nodes (a.k.a. vertices) labeled {l,...,n}, and pairs of counts 
{(Xij, Xji), i < j} associated to the set of dyads (pairs of nodes) {{i,j},i < j}- The count Xij could be ob- 
tained, for instance, as the amount of interaction going from node i to node j during certain period of time, and 
so we call it interaction count. In undirected interaction networks we call (i, j) the edge associated to the dyad 
< j, and Xij — Xji is the value of the edge. For directed interaction networks, X^ may be different from 
Xji, and each dyad {i,j},i < j, has two associated arcs: and (J,i), which take the values Xij, and Xji, 

respectively. 

In Sections |2 . 1 1 and |2 . 2| we present our guideline approach to arc strength for directed interaction networks, and 
in Section [2. 4| we briefly adapt these ideas to edge strength in undirected networks. 

2.1 A Null Model for Directed Interaction Networks 

In order to construct a model-based approach to arc strength, we need to think of a null model under which we 
can say that all arcs have the same strength. Assume the interaction counts {(X^, Xji), i < j} follow a distribution 
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Chat Messages from Females to Males 
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Figure 1: Number of chat messages exchanged between 786 mixed-gender pairs of myGamma users 
with declared friendships. These users were located in Kolkata, India, and the chat messages were 
sent during June 2010. In the scatterplot, the horizontal axis represents the interaction going from 
females to males, the vertical axis the interaction from males to females, and the diagonal line 
represents equality between the number of messages exchanged by the pair of users. The bars of the 
histogram on the top represent the frequencies (on the squared root scale) of female-to-male arcs 
with the number of messages presented in the horizontal axis of the scatterplot. The histogram on 
the right has a similar construction for male-to-female arcs. 

G, such that E(Xij) = 9otij3j, for all i ^ j, where 9, ctj, and /3j are positive numbers. This represents a model of 
independence for the expected interaction counts, i.e., the expected amounts of interaction are explained only by 
nodal characteristics. In this context we call the parameters on the gregariousness of node i, j3j the attractiveness of 
node j, and 9 the density of interaction, although their specific interpretation is subject to constraints imposed to 
ensure identifiability. We believe that a model of independence for the mean interaction is a fundamental component 
of a scenario that can be interpreted as all arcs having the same strength. 

Our null model, however, still needs to completely specify the distribution G. In order to select this distri- 
bution, let us think of a process where we can say that there are no differential values of arc strength involved. 
Suppose we observe the interaction from node i to node j in the time interval [0, 1], so that at time there is no 
observed interaction. Assume the amounts of interaction for two non-overlapping time intervals are independent. 
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Furthermore, assume in any time interval there is a non-zero probability of interaction, and that different single 
interaction events cannot happen at the same time. Finally, suppose the interaction counts for two equal sized time 
intervals are identically distributed. The reader may realize that we just described a Poisson process (see, e.g., 



Parzen 1962). Assuming that we have one independent process for each arc, we obtain a scenario where the nodes 
interact independently with each other over time. The interaction count Xy at the end of the period follows a 
Poisson distribution with certain mean my. If these means can be expressed as = OctiPj for all i ^ j, then we 
say that there are no differential values of arc strength governing the interaction in the network. 

The Poisson model with independence for the mean has been taken as a null model in different studies, where 



departures from it are interpreted as social structure (Zheng et al. 2006 DiPrete et al. 2011 1. This is also one of the 



simplest models that we can think of when modeling count data, although we can expect to find other more appealing 
null models of independence for interaction networks. We expect that the ideas presented here can be adapted easily 
to those scenarios, but for the remainder of this article we focus on the Poisson model with independence for the 
mean as the scenario having no differential values of arc strength. 

2.2 Arc Strength as a Departure from Independence 

From the previous section we have that, if the interaction data {(Xij, Xji), i < j} are generated independently as 

Xij ™ d Poisson(0aj/y) for i ^ j, (1) 

then we say that all arcs have the same strength in the network. Notice that although we refer to model ([!]) as 
a model of independence, it actually corresponds to a quasi-independence model (see Bishop et al. 1975[ ), since 



the counts Xu, i = 1, . . . , n, are not defined. Model (JlJ is not identifiable unless we impose a set of constraints 
on the sets of parameters {cy, . . . , a n }, and {Pi, ■ ■ ■ , /?«}• We could fix, for instance, a,\ = Pi = 1, or require 
Ili a i — Yij Pj — 1j although the set of constraints is arbitrary, and the interpretation of the parameters changes 
accordingly. Notice that the model in equation (JlJ has 2n — 1 free parameters after constraints have been imposed, 
and the amount of valued arcs is n(n — 1), which leads to n 2 — 3n + 1 degrees of freedom that, in principle, could 
be used to capture departures from this model. Let Ay measure the multiplicative departure of my from the mean 
in the independence model of equation (JlJ, i.e., Ay is a parameter included specifically for the arc and it can 

be written as Ay = my /OctiPj. Thus, we say that under the model 

X^ ™ d Poisson^cy/yAy) for i ^ j, (2) 

Ay can be interpreted as the strength of the arc One parameter Ay for each arc represents our ideal measure 

of arc strength. This approach, however, is unfeasible. In order for the n(n — 1) parameters Ay to be included 
jointly with {ay /3y i — 1, . . . , n}, we require constraints like Xu = Xij = 1, or Y\ i Xij = FT. Ay = 1, for all i and 
j, which leads to (n — l)(n — 2) free parameters Ay. Consequently, the number of free parameters of model ([2| is 
1 + 2(n — 1) + (n — l)(n — 2) = n(n — 1) + 1, which exceeds the n(n — 1) interaction counts available to us. We 
thus conclude that model ^ is not suitable for statistical inference in this context. Model ([2]), however, represents 
a guideline for how arc strength should be conceptualized, this is, as a departure from a model of independence. 



2.3 Modeling Alternatives 

The above mentioned difficulties lead us to consider more parsimonious models that conserve the essence of our 
intuition for arc strength. The spectrum of alternatives start with modeling parameters as functions of covariates 
(if available): and /y as functions of nodal covariates, and Ay as a function of arc covariates. On the other 
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side of the spectrum we have models that take parameters as latent variables: and /3j's distributions depend on 
nodal covariates, and Xij's distribution depends on arc covariates. The selection of the appropriate model depends 
on the research purpose, and on the data at hand. For instance, if the researcher's focus is on exploring the 
distribution of arc strength, gregariousness, or attractiveness in the network, then modeling these characteristics as 



latent variables would be the natural way to proceed. The approach presented in Xiang et al. (2010) to construct 
a notion of relationship strength uses a combination of these alternatives, although their input data is a number of 
dichotomized interaction variables which measure whether specific kinds of interaction are null or not. 

An additional motivation for moving towards simplified models is that we may be interested in studying arc 
strength for only a subset of dyads. As an example, in online social networks we may want to study arc strength for 
the subset of dyads with declared binary links (e.g., "friendships"), in which case the estimation of arc strengths as 
fixed effects would be even more cumbersome, since the number of interaction counts would be smaller than n{n— 1). 
Furthermore, in this scenario we may not even be able to fit model |l]). If for instance nodes i and j had declared 
binary links only among themselves, then only the two interaction counts Xij and Xji would be available from this 
pair, but model ([!]) would still require the estimation of on, otj, Pi, and pj. In order to tackle this scenarios, Section 
[3] presents a simple model that falls somewhere in the middle of the spectrum of possibilities presented above, and 
it is motivated by the study arc strength in online social networks. 

2.4 Edge Strength in Undirected Interaction Networks 

In the case of undirected networks, if the interaction counts {X^, i < j} are generated according to the model 

™ d Poisson(0/3 i /3 J ), for i < j, (3) 

then we say that there are no different values of edge strength governing the interaction in the network. We can fix 
Pi = 1, or require Y[j Pj = 1, to ensure the identifiability of this model. Notice that in undirected networks we do not 
obtain parameters associated with gregariousness or attractiveness. The model in equation ([3| is the Poisson analog 



of the so called beta model for undirected binary networks (see Rinaldo et al. 2011 Chatterjee et al. 2011 ). Similarly 
as for directed networks, our ideal parameter of edge strength is a multiplicative departure of the expected amount 
of interaction from the mean in equation ([3]), i.e., Ay = m%j /OpiPj. The inclusion of the Xij parameters jointly with 
the Pi parameters would require an additional set of constraints. For instance, if we fix Pi = 1, it would suffice to 
set Xu = 1, for alH > 1. The resulting model would involve 1 + (n — 1) + [n(n — l)/2 — (n — 1)] = n(n — l)/2 + 1 free 
parameters, whereas the number of interaction counts is only n(n — l)/2. This inconvenience leads us to find more 
parsimonious alternatives, as explained in the previous section. For the remainder of this article we focus, however, 
on an alternative model to study arc strength in directed interaction networks. 



3 Latent Arc Strength Stochastic Blockmodels 

We use our ideal approach to arc strength as a guideline, and propose a pair-dependent stochastic blockmodel 



(Holland et al. 1983 1 to study the distribution of arc strength in a network using count interaction data. This model 
assumes that the nodes are divided into homogeneous groups, or blocks, in the sense that the nodes are equally 
gregarious and attractive within block, and the distribution of arc strength depends only on the nodes' memberships 
to the different groups. In other words, this model assumes that the nodes' block-memberships determine the 
distribution of the dyads' interaction. This approach aims to model parsimoniously interaction networks not only 
via a block-structure, but also by treating arc strength as a latent variable. We call this class of models latent arc 
strength stochastic blockmodels (LASSB). Notice that we would still need a mechanism for specifying or finding the 
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blocks mentioned above. The parametrization of the LASSB presented below could be incorporated into the general 



methodology for block discovery presented in Mariadassou et al. (2010). In this article, however, we assume that 



the groups can be built up from nodal covariates, where, for instance, the blocks could be specified a priori by the 
researchers according to their exploratory interests. 

We propose to model the distribution of arc strength for each arc-block. In order to choose a sensible parametriza- 
tion, we take into account what we have learnt from observing interaction in online social networks. In Figure [I] 
we explored the distribution of chat messages for a subset of dyads with declared friendship links in the myGamma 
mobile social network. We saw that there are only a few user-arcs with large amounts of interaction, whereas most 
of the interaction is null or pretty small. We believe this is evidence that only a few arcs are strong, whereas most 
of them are weak. Hence, we propose to model Ay using some distribution defined on the non-negative reals, with 
a monotonically decreasing probability density function, which indicates that the proportion of arcs decreases as 
their strength increases. Furthermore, we observed the interaction from user i to user j to be highly correlated with 
the interaction from j to i. We propose to capture this feature by allowing correlation of the pair of arc strengths 
associated to a dyad, and so we model Ay and Xji jointly. 

3.1 Model Description 

Let the n nodes of the network be partitioned into S blocks (or node-blocks) denoted B s , s = 1, . . . ,S. We say 
the arc (i, j) belongs to the arc-block B rs if i E B r and j E B s . Similarly, we say the dyad {i,j} belongs to the 
dyad-block B rAs if i E B r and j E B s , or if j E B r and i E B s . Notice that if (i,j) E B rs , then {i, j} E B rAs , and 
(j, i) E B sr . In particular, notice that if E B rr , then (j, i) E B rr . 



The approach presented in this article models {(Ay, A,-,*), i < j} indirectly, adapting the ideas of Nelson (1985). 
Let the dyad strength be Ai A j = Ay + Xji, and let the arc share be pij = A^/A^j- From this formulation, the 
closer to 0.5, the larger the reciprocity in the relationship between nodes i and j. Note that we can obtain 
back Xij — PijXi A j, and Xji = (1 — Pij)Xi A j. By using this transformation in our block-modeling approach, the 
distribution of Xi A j depends only on the dyad {i, j}'s membership to the different dyad-blocks, and the distribution 
of p^ depends only on the membership of the arc (i, j) to the different arc-blocks. Notice that the distribution of 
Pij trivially determines the distribution of pji = 1 — py. Consequently, for modeling {(Xij, Xji), i < j} we need to 
specify the distribution of Aj/y for each of the S(S + l)/2 dyad-blocks, and the distribution of py for each of the 
S(S + l)/2 arc-blocks B rs with r < s. 

Modeling Xi A j as a gamma random variable allows the marginal distribution of Ay to have the desired charac- 
teristics that we mentioned before, since the gamma density function is monotonically decreasing when its shape 
parameter v rAs is lower than or equal to one. In this article we thus propose a parametrization of the LASSB for 
the interaction data X := {{Xij, Xji), i < j} using a hierarchical structure, as follows: 

X iA j\{i,j) E B rs ~ Gamma(/i rAs ,i/ rAs ), (4) 

Pij\(i,j) E B rs l ~ Beta(n rs ,(j) rAs ), 

Xij\Xij, (i,j) E B rs ™ rf Poisson(#a r /3 s Ay), 

Xji\Xji, (i, j) G B rs ™ rf Poisson^as/^A^), 

for all i < j, and r < s. Without loss of generality, we assume that the ordering of the nodes is such that if i E B r , 
j E B s , and r < s, then i < j. The gamma and beta parts of the model are parameterized in terms of their means 
p r /\ s , and TT rs , respectively (their densities are presented at the beginning of Appendix [A] for clarification) . Note 
that if pij ~ Beta(7r rs , 4> rAs ), then pji ~ Beta(7r sr , 4> r /\s), with 7r sr = 1 — ir rs . For dyad-blocks B ss we fix 7r ss = 0.5, 
since the ordering of the nodes within the same block is arbitrary. 
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It is easy to see that if the parameters a r , /3 S , /i rAs ; r, s = l,...,^} are not jointly constrained, then the 
model in equation Q is not identifiable. Let us then introduce an equivalent representation of the model. Let 
— JiAjHrAs, for {i,j} € B rAs , namely ji A j measures how dyad strength departs from its mean value, and so 
we call it relative dyad strength. Defining 7^ = Pijji A j and 7^ = (1 — Pij)"fiAj, the following parametrization is 
equivalent to the one presented in equation Q: 

liAjKhj) € B rs ~ Gamma(l,j/ rAs ), (5) 

pij\{i,j) 6 B rs ~ Beta(7r,. s ,0 rAs ), 

Xij\jij, € B rs ~ Poisson((9c>! r /3 s 

X 3i\ljii (m) G B rs ™ d Poisson(6'a ;j /3,.^ rAs 7 r i), 



for all i < j, r < s. The expression 9a s f3 r p rAs represents a quasi-symmetry model (see Bishop et al. 1975 Agresti 



2002) at the block level, since p r /\s — Ps/\r given that B rAs = B sAr . Hence, we set the constraints a.\ = = 1, 
and ^iiAs = 1 for s = 1, . . . , S, to avoid non-identifiability of the model. 

In model ([5| there is one parameter is rAs , and one (j) rAs for each dyad-block, which lead to 2 x S(S + l)/2 
parameters. Although there are also S(S + l)/2 parameters p rAs , S of them are constrained to be one. Model ^ 
also has one parameter ir rs per arc-block, but S of them are fixed to be 0.5, and the constraints ir rs + n sr = 1, for 
all r < s, have to hold, which lead to S(S — l)/2 free Tr rs parameters. Finally, model ^ includes one parameter 
9, and 2 x (S — 1) free node-block parameters a r and (3 S . We thus conclude that the LASSB as parameterized in 
equation ^ involves 2S 2 + 25—1 free parameters. In Appendix [X] we present an EM algorithm for the estimation 
of the set of parameters $ = {9, a r , (3 S , p rAs , v rAs ,ir rs , rAs ; r, s = 1, . . . , S} via maximum likelihood. 



3.2 Model Interpretation 

The latent arc strength stochastic blockmodel (LASSB), as presented in equation Q, indicates a generative process 
for the observed dyads' interaction, which only depends on the membership of the arcs to the different arc-blocks. 
As presented in equation Q, we assume that a gamma distribution generates the strength of the dyad {i,j}, which 
is shared between the two corresponding arcs according to a beta distribution. Given the strength of the arc (i, j), 
Kj = pij^iAj) th e interaction from i to j, Xy, is distributed according to a Poisson distribution, with a mean 
parameter that depends on the arc strength, and also on the gregariousness of node i, and the attractiveness of node 
j, which are assumed to be constant for all nodes within the same block. An interesting feature of the LASSB is 
that, thanks to its block structure, it does not necessarily require the complete set of n(n — l)/2 dyads for it to be 
fitted (see Nowicki and Snijders 2001 ). In particular, it can be fitted to a subset of dyads that are linked at a basic 
level, such as connected pairs of users in online social networks. 



3.2.1 Distribution of Dyad Strength 

According to the parametrization of the LASSB, we cannot estimate directly the mean dyad strengths {/i rAs , T < s} 
since they have to be constrained jointly with the gregariousness and attractiveness parameters. However, the shape 
parameters \v Ths ,r < s} capture important information about the distribution of dyad strength. The parameter 
v r As controls the shape of the probability density function (PDF) of dyad strength, and also its variance, since 
Var(XiAj) — Pr As /v r /\s for {i, j} G B rAs . As v rAs goes to infinity, the PDF of dyad strength gets concentrated 
around its mean, and its variance goes to zero, indicating that all dyads tend to have the same strength. On the 
other hand, as v rAs goes to zero, the PDF of dyad strength becomes more skewed to the right, and the variance goes 
to infinity, indicating that the dyads become more heterogeneous in terms of their strengths. For instance, when 
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studying online social networks we would expect v rAs to be small for all dyad-blocks, as we expect to have lots of 
weak dyads and only a few strong ones. 

3.2.2 Distribution of Arc Share 

In the LASSB we assign an arbitrary ordering to the blocks, which allows to estimate the mean arc share 7iy s of 
the arc-block B rs , for r < s (r ^ s), since in those cases we can create ordered pairs from nodes i € B r , and 
j € B s . The arc share indicates how symmetric the relationship between a pair of nodes is, and then -K rs allows 
to find how symmetric on average the relationships in each dyad-block are. For dyads where both nodes belong to 
the same block, it is not possible to assign a meaningful order to the pair, and consequently we fix tt ss = 0.5. In 
all cases, however, the shape parameter of the beta distribution <j> r /\ s controls the concentration of the arc share's 
PDF around its mean. Furthermore, rAs controls the variance of arc share, since Var(pij) — 7r rs 7r sr /(0 rAs + 1), 
for € B rs . Hence, large values of rAs indicate that most arcs in B rs tend to have the same share in their 

corresponding dyad strengths, or in other words, most relationships of dyads in B rAs tend to be equally symmetric 
or equally asymmetric, depending on the value of n rs . On the other hand, low values of rAs indicate a large spread 
of arc share, or equivalently, large heterogeneity in terms of how symmetric or asymmetric the dyads' relationships 
are. 

3.2.3 Association Measures 

Even though the LASSB controls gregariousness and attractiveness per block, the parameters {6, a r , (3 S , ^i rAs ; r, s = 
1, . . . , S} are not interpretable directly, given that the set of constraints on them is arbitrary. Nevertheless, these pa- 
rameters determine some measures that are informative of network structure. For instance, let m rs :— 9a r f3 s iJ, rAs ir rs 
be the marginal expected amount of interaction for arcs in B rs . The ratio m rs /m sr compares the frequency of in- 
teraction from block B r to block B s with respect to the interaction from B s to B ri and it can be written as 

m sr a s f3 r (1 - ir rs ) ' 

where it is clear that this interaction ratio is determined by how gregarious nodes in B r are with respect to nodes in 
B s , how attractive nodes in B s are with respect to nodes in B ri and how asymmetric on average the relationships 
of dyads in B rAs are, which is represented by the odds n rs /(l — ir rs ). Notice however, that n rs /(l — ir rs ) does not 
depend on different sets of constraints. Consequently, the ratio 

7T rs /(l - 7iy s ) a s p r 

is invariant to different constraint sets for the gregariousness and attractiveness parameters, and it can be interpreted 
as discounting the average asymmetry of the relationships from the interaction ratio, which leads to a measure 
determined only by the gregariousness and attractiveness of the blocks. Another interesting measure is the block- 
odds ratio 

m rs /Tn rs ' ^rs/f^rs' 

which depends only on the mean arc strengths per arc-blocks fi rs :— ^ rAs 7r rs . The above block-odds ratio takes 
block B r 's odds of interacting to B s vs. B s r, and compares them to block B r r's odds of interacting to B s vs. 
B s i . Since this measure docs not depend on gregariousness nor attractiveness of the blocks, it allows to discover 
comparative associations between blocks due only to the strength of the relationships between nodes. 
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3.2.4 Recovered Arc Shares and Relative Arc Strengths 

The hierarchical structure of the LASSB also allows us to explore X*j := 7ij/7r rs = Xij/p, rs for G B rsi which 
measures how arc strength departs from its expected value, and so we call it relative arc strength. The variance of 
A*„- can be written as 



Var(X* 



1T rs 4>r 



1T rs (4>r 



TT rs (4> r 



-, if (i,j)eB rs . (6) 



We saw that in scenarios where most relationships are symmetric, or closely symmetric, </vas is large, and n rs is 
close to 0.5. In those cases the factor within brackets in equation Q would be basically equal to one, and the second 
summand in ^ would be close to zero, which means that Var(X*j) would be controlled by the variability of relative 
dyad strength, which is l/is rAs . The measure Ay is interesting since it allows to adjust for the homophily/heterophily 
captured by the blocks. For instance, suppose two arcs are equally strong, i.e., Ay = Xu, and that we only have 
two blocks in the network. Say i,j G B\, I G B2, and /in 3> p\2- This scenario corresponds to one where 
homophily explains part of the network structure, since nodes in block By, on average, have stronger arcs with 
nodes in the same block than with nodes in block B 2 . Consequently, under this scenario the arc (i, I) is relatively 
stronger than Thus, relative arc strength accounts for these scenarios, and it can be recovered as A* 3 = 



E^ (pij'JiAj \X = x)/n rs , for G B rs , where E^(pijji A j\X — x) can be obtained as in equation (12| in Appendix 
|aJ using the maximum likelihood estimates (MLEs) $. Naturally, we can also explore the recovered arc share 
pij = E^{pij\X = x) = (xij + </vas^Vs) / +Xji + <Pr/\s)i for G B rs , and the recovered relative dyad strength 
7i A j = E^(ji/\j\X — x), which can be computed using <f>, and equation (111 in Appendix [X] Notice that in the 
above notation x := {(xij,xji),i < j} represents the observed interaction counts. 



4 Friendships in the myGamma Kolkata Network 

In this section we present the fit of the LASSB to the myGamma data described in Section |1.2| We take chat 
messages exchanged during June 2010 between dyads that were connected by at least one directional friendship, 
and we focus on users in Kolkata, India, taking gender as our a priori blocking criterion. Our data consists of 786 
mixed-gender dyads (bp A M = 786), 33 dyads of females (&faf = 33), and 156 dyads of males (&mam = 156), which 
jointly involve 188 different users. Among this set of users, 32 are involved in isolated dyads (i.e. there are 16 
isolated dyads), and so we would not be able to estimate individual gregariousness and attractiveness parameters 
for each of them. 

We fit the LASSB using the EM algorithm presented in Appendix |Aj and in order to obtain confidence limits 
for the parameters of the model, and for other interesting functions of the parameters, we follow a parametric 
bootstrap approach. We generate 300 interaction networks from the maximum likelihood fitted LASSB, and then 
we fit the LASSB to each of these bootstrap networks. Each bootstrap network is obtained by sampling interaction 
from the fitted model for each of the friendship dyads that we study. Since all the LASSB's parameters are defined 
on the non-negative reals, we transform them to the natural logarithm scale, then we compute basic bootstrap 



confidence limits (see Davison and Hinkley 1997) for the log-parameters, and finally we exponentiate the limits of 
these intervals. The same procedure was used to find confidence limits for the functions of the parameters that we 
study, since they are also defined on the non-negative reals. All these simulations and computations were performed 



using the software R (R Development Core Team 2012). We present the results in a series of four tables containing 
MLEs, and parametric bootstrap confidence limits with a 95% nominal confidence. Tables [l] and [2] contain the 
estimates of the LASSB's parameters, Table [3] contains the estimates for some functions of the parameters that are 
indicative of network structure, and Table [3] contains the estimates of mean interaction per arc-block. The point 
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Table 1: MLEs of 9 and node-block parameters in the LASSB for the Kolkata interaction 
data. Numbers preceded by an asterisk are fixed in the model. Parametric bootstrap 95% 
confidence limits appear in parenthesis under the corresponding point estimates. 



§ 


Block s 


& s 


k 


1.42 


Female 


*1 


*i 


(0.57 - 7.91) 










Male 


3.77 


3.80 




(0.68 


- 9.39) 


(0.69 - 9.55) 



Table 2: MLEs of arc-block and dyad-block parameters in the LASSB for the Kolkata inter- 
action data. Numbers preceded by an asterisk are fixed in the model. Parametric bootstrap 
95% confidence limits appear in parenthesis under the corresponding point estimates. 



Arc-Block rs 


l^rAs 




VrAs 


TT rs 


4>rf\s 


brAs 


Female - Male 


*1 




0.10 


0.27 


12.62 


786 






(0.09 


- 0.12) 


(0.24 - 0.29) 


(8.87 - 17.84) 




Female - Female 


*1 




0.07 


*0.50 


56,940.19 


33 






(0.02 


- 0.15) 




(30,371.28 - 1,417,551,821) 




Male - Male 


0.28 




0.05 


*0.50 


71.32 


156 




(0.08 - 2.13) 


(0.03 - 


- 0.07) 




(0.08 - 239.79) 





estimates for the functions of the parameters are obtained using the invariance property of MLEs. In Tables [T] and 
[2] the values with asterisks are pre-fixed, as explained in Section [3] We also present in Table [2] the number of dyads 
in the dyad-blocks B rAs , which are denoted b rAs . 

The estimates v r /\ s in Table [2] indicate that dyads involving both genders have less variability in terms of their 
relative dyad strength, compared to dyads involving the same gender. For instance, the estimated variability of 
relative dyad strength for dyads involving only males is estimated as 1/ vm am ~ 20, which doubles the corresponding 
variability for dyads involving both genders. The fact that the upper confidence limit for vmam/^fam in Table [3] is 
bounded below one allows us to say with 95% confidence that the variability of relative dyad strength for male-male 
dyads is larger than the corresponding variability for dyads involving both genders. The same conclusion, however, 
can not be stated when comparing the variability of relative dyad strength for female-female dyads with the other 
two groups of dyads, since the confidence limits for vfaf/vfaMi and for vfaf /vmam include one. The lecture of 
the estimates {z> rAs ;r, s € {F,M}} also tells us that the proportion of relatively weak dyads involving only males 
tends to be larger, compared to mixed-gender dyads, since the proportion of 7i A j being close to zero is larger for, 
say, vmam = 0.05, than for Pfam — 0.10. 

We can also see that the relationships between males and females are asymmetric on average, since the estimated 
mean arc share for female-to-male arcs, ttfm, is 0.27, which indicates that out of the dyad strength for mixed- 
gender dyads, only 27% on average corresponds to the strength of the female-to-male arc. The 95% confidence 
limits associated to ttfm are 0.24 - 0.29 (Table [2]), with which we can reject the hypothesis of average symmetry in 
the relationships between males and females (ttfm = 0.5). We can also see that the relationships between females 
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Table 3: Some measures of network structure for the Kolkata interaction data. We present 
their MLEs along with parametric bootstrap 95% confidence limits (CL). 



Measure 


MLE 


CL 




0.45 


0.28 - 0.69 


"faf/vfam 


0.66 


0.20 - 1.54 


vfafIvmam 


1.45 


0.45 - 3.71 


4>FAF 1 '<f>MAM 


798.38 


278.49 - 1,050,115,823 


m MF /ra MM 
m FF /m FM 


2.82 


0.37 - 9.68 


m MF / m FM 


2.69 


2.38 - 3.07 


m MF /m FM 


0.99 


0.98 - 0.99 



Table 4: Estimated mean interaction per arc-blocks m rs := 9a r f3 s ^, r ^ s TT rs . Parametric boot- 
strap 95% confidence limits appear in parenthesis under the corresponding MLEs. 



To 



From 


Female 




Male 


Female 


0.71 




1.46 




(0.28 - 3.96) 


(1.16 


- 1.91) 


Male 


3.93 




2.85 




(3.07 - 4.97) 


(1.60 


- 6.23) 



tend to be highly symmetric, since 4>faf = 56,940.19 indicates that the arc shares are highly concentrated around 
0.5. Notice that although the confidence interval for 4>f/\F is very wide, the values that it contains support the 
same claim. Compared to relationships between females, relationships between males seem to have a wider range 
of variability in terms of their asymmetries, since 4>mam — 71.32 indicates that the arc shares are more spread out 
from 0.5. This conclusion can also be stated from the confidence interval for 4>faf/4>mam in Table [3j which is 
located far above one. These variabilities are also observed in the recovered pij, which are presented in Figure [2] 

In equation ^ we saw how the variance of relative arc strength behaves as a function of v r/ \ s , 4> r /\ S: and 7T rs . 
According to the obtained estimates, we can see that for all blocks, the variability of relative dyad strength dominates 
Var(\*j), since the parameters v r ^ s are close to zero, and the ^ rAs are large in general. This variabilities are also 
reflected in the histograms of Ajy , which are presented in Figure [3] 

From Table [3j males' odds of sending chat messages to a female vs. a male are 2.82 times the females' odds 
of sending chat messages to a female vs. a male. Since this measure only depends on the mean arc strength per 
arc-blocks, it could be interpreted as evidence of heterophily in the network, since it would mean (j>mf/ fi>MM = 
2.82^,ff//J'FM- However, its associated confidence interval contains the point one, which does not allow us to reject 
the hypothesis Hmf / Hmm = Hff I V-fm- We also obtain the interaction ratio vumf I^fm = 2.69, with an associated 
confidence interval that allows us to say that the expected interaction from male to female is significatively larger 
than the expected interaction from female to male. When discounting the asymmetry of the relationships between 



12 



Female to Female 



Female to Male 



0.2 0.4 0.6 0.! 

Recovered Arc Share 



0.2 



0.4 0.6 

Recovered Arc Share 



0.8 



Male to Female 



Male to Male 




0.4 0.6 

Recovered Arc Share 



0.4 0.6 

Recovered Arc Share 



Figure 2: Recovered arc share for the four arc-blocks. 
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Figure 3: Recovered relative arc strength (on the squared root scale) for the four arc-blocks. 



females and males from this interaction ratio, we obtain a measure of 0.99 with a really sharp associated confidence 
interval. Under the LASSB, this measure can be interpreted as evidence that the asymmetries in the relationships 
among the mixed-gender dyads nearly explain the imbalance of the interaction ratio, whereas the gregariousness 
and the attractiveness of the blocks jointly nearly compensate each other. 

In the left hand side of Figure [4] we explore how the recovered relative arc strength changes compared to the 
actual amount of interaction, measured as number of chat messages. We can see that as the number of chat messages 
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10 20 30 40 50 2 4 6 8 10 



Chat Messages Female to Male Sq. Root Recov. Relative Arc Strength 

Figure 4: Left: number of chat messages per arc vs. recovered relative arc strength. Right: female 
to male vs. male to female recovered relative arc strengths, on the squared root scale. The diagonal 
line represents equality between the two relative arc strengths coming from a dyad. 



increases, the relative arc strength increases at different rates for the four arc-blocks. For instance, an interaction of 
10 messages from female to female is relatively more important than 10 messages from a male to a female. Finally, 
in the right hand side of Figure [4] we compare the two recovered relative arc strengths for mixed-gender dyads. 
We can see that large values of female-to-male recovered relative arc strength are paired with lower values for 
the male-to-female counterpart, which contrasts with the observations that we made on chat messages exchanged 
between mixed-gender dyads (Figure [I]). Although males tend to send more chat messages to females than females 
to males, whenever a dyad has relatively strong arcs, the female-to-male arc tends to be relatively more important 
than the male-to-female one. 



4.1 Checking Goodness of Fit 



In order to check the goodness of fit of our model we follow the ideas of Hunter et al. ( 2008 ) by comparing structural 



statistics of the observed network with the corresponding statistics on networks simulated from the fitted model. 
We firstly check how our model fits to a dichotomized version of the network by studying the nodal distribution of 
binary outdegree: J2j H-^-ij > 0)i an d binary indegree: JT I(Xji > 0), where /(•) represents the indicator function. 
We choose to study these statistics since in different applications, interaction counts are dichotomized assigning a 
link from node to node whenever there is some amount of interaction, and consequently it is reasonable to ask if 



our model predicts well these binary characteristics, despite it being designed for interaction networks (see Thomas 



and Blitzstein (2011) for a discussion on consequences of dichotomizing valued networks). We also study how the 



fitted LASSB predicts the distribution of valued outdegree: Y)j Xg, and valued indegree: Y]j Xjj, since these two 
measures represent the total amount of interaction going from, and to certain user, respectively. We also consider the 
distribution among dyads of absolute interaction difference: |Xy —Xji\, since this measure reflects the dependencies 
of the arcs in a dyad. Finally, since our model assumes dyadic independence, it is important to check whether this 
assumption is reasonable for the Kolkata friendships network. A good way to check this assumption is by studying 
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triads' characteristics, since this allows to detect transitivity effects in the network. Let the dyad interaction be 
XiAj = Xij + Xji. The number of triangles at level c is defined as J2i<j<k I(Xi/\j > c)I(Xi A k > c)I{Xj h k > c), 
where c represents a cutoff value. We explore this measure for different cutoff values c. 

We generated 1,000 interaction networks from the fitted LASSB model, and we present the summarized results 
in Figure [5] following the format of Hunter et al. (20081. For each simulated network, and for the first five statistics 
mentioned above, we computed the proportion of nodes or dyads with their corresponding statistic being equal to a 
specific value or range of values, as specified in the horizontal axis of panels (a) to (e) in Figure[5] We also computed 
the proportion of triangles among friendship triads varying the cutoff c from zero to six. For the statistics on nodes, 
and on dyads, the distributions of their corresponding simulated proportions are explored using one box-plot on 
the log-odds scale for each value or range of values of the different statistics. In the case of the proportion of 
triangles at different levels, one box-plot is presented for each cutoff c. In Figure [5] the bold black lines represent 
the proportions observed in the Kolkata friendships network, and the gray lines represent intervals containing 95% 
of the simulated proportions. We say that we obtain a good fit of the model to certain network characteristic if the 
observed proportions fall within the range of variation obtained from the simulation. 

We can see from panels (a) and (b) in Figure [5] that the LASSB fits well to the distribution of binary outdegree 
and binary indegree of the network, although the observed proportion of nodes having a binary indegrec equal to 
four is low compared to the values obtained in the simulations. Panel (c) of Figure [5] shows how the model fits to 
the distribution of valued outdegree. We can see that the model tends to produce networks where the distribution 
of valued outdegree has a right tail heavier than in the observed network, although the observed values fall within 
the variation range predicted from the model. Panel (d) in Figure [5] indicates that the fit of the model to the 
distribution of valued indegree is pretty good, although the proportion of nodes with observed zero valued indegree 
is a little high compared to the normal range of variation obtained from the model. From panel (e) in Figure [5] 
we can observe that the model fits properly to the distribution of absolute interaction difference among friendship 
dyads, although the simulated proportions of dyads with an absolute interaction difference of nine or more tend to 
be larger than what we observe in the Kolkata network. Finally, in panel (f) of Figure [5] we can see how the LASSB 
fits to the proportion of triangles at different cutoff levels. We can see that the proportions of triangles in the 
Kolkata friendships network are in general larger than what is typically expected from the model. This result was 
expected since our model does not capture transitivity effects. Nevertheless, the fact that the observed proportions 
of triangles fall within the simulated variation range indicates that the transitivity effects in the network are not 
too large. 



5 Discussion 

Our approach to arc/edge strength requires the construction of a null model that can be regarded as a scenario 
where there are no differential values of arc/edge strength. We argued that this null model should include a notion 
of independence for the expected amounts of interaction, which in the case of directed interaction networks indicates 
that only the nodes' gregariousness and attractiveness explain the expected interaction counts. The null model for 
the interaction network requires the complete specification of a distributional form. In this article we explained 
why an independent Poisson process for each interaction count could be a sensible choice, although we expect other 
choices to be reasonable as well. Arc/edge strength was therefore casted as a multiplicative departure from the 
expected amount of interaction under our null model. 

We showed that taking each arc/edge strength as a fixed effect leads to models containing too many parameters, 
which is not useful for statistical analysis. Using our ideal approach to arc/edge strength as a guideline, we mentioned 
a range of alternatives that aim to build parsimonious models by treating parameters as functions of covariates or 
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Figure 5: Summary of 1,000 simulations for checking the goodness of fit of the LASSB to the 
Kolkata interaction data. In all plots the frequencies are presented in the log-odds scale, the 
black lines represent the observed frequencies in the Kolkata data, and the gray lines represent 
intervals containing 95% of the simulated frequencies. The boxplots represent the distribution of 
the frequencies obtained from the simulated networks. 
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as latent variables. In particular, we developed a latent arc strength stochastic blockmodel (LASSB) for directed 
interaction networks, which takes arc strength as a latent variable, and jointly models the two arc strengths coming 
from a dyad, capturing dependencies in the interaction counts. The LASSB further assumes the existence of blocks 
of nodes which are homogeneous with respect to their gregariousness and attractiveness, leading to a parsimonious 
way to explore the structure of the network. Given that the LASSB does not involve node-specific fixed effects, 
it can be used to explore arc strength for a subset of dyads that may be the focus of interest, such as dyads with 
declared friendships in online social networks. 

We saw how the proposed ideas allow to quantify asymmetries of the relationships between genders in the 
myGamma Kolkata network. This approach also helped us to explore distributional characteristics of arc strength 
in the network, which is indicative of the strength of the online relationships between users of myGamma. Although 
we saw that our model fitted properly to some characteristics of the observed network, it is clear that models 
relaxing the dyad independence assumption would capture more information about the structure of the network, 
such as the presence of transitivity in the interaction between users. This article can be considered as a first step in 
the construction of appropriate methodologies that incorporate the notion of arc/edge strength in the modeling of 
interaction networks. 
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A Appendix: EM Algorithm for the LASSB 



We present an EM algorithm (Dempster et al. 19771 for fitting the LASSB via maximum likelihood 



A.l Complete Likelihood 

In the formulation of the LASSB in equation Q, the gamma, and beta parts of the model are parameterized in 
terms of their means fJ, rAs , and 7r rs , such that their density functions are given by 



hr{K/\j) 



and 



flB(Pij) 



B(^ rAs 7T rs , 0rAs(l — TTrs)) 



^i/\T 1 eX P( _ v rf\s^if\j / ^rf\s), 

«*«- l (l_ Wi )Wl-*r.)-l. 



The density for X iA j is presented just for clarification, since we actually use the LASSB as in equation ([5]), where 
the gamma part has its mean fixed as one. Now, let us define — 9a r f3 s ^, r ^ s if € B rs in equation The 
complete likelihood of the LASSB is thus given by 



n n 



r(^VAs)B ((f> r /\ s TT rs , rAs (l — TT rs j) XijlXjil 



x exp 
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where x iA j = x^ + Xji . Note that the kernel of the conditional density of ji A j , pij \X = x can be obtained specifying 
the following hierarchical structure: 



Pij\X = x, e B rs ~ Beta — ,x iAj + <p rAs , (7) 

\ x iAj + <Pr/\s J 

liAj\X = x,p tJ , e B rs ~ Gamma ( — X%M r- " rAs ,x iAj + v rAs ) . (8) 

\Pij\ T ij ~~ T ji) i T ji i v rAs J 

This property is useful for deriving the expectation step of the EM algorithm. 

In order to estimate the vector of parameters $ involved in the model, we only need to take into account the 
part of the complete log-likelihood that involves $, i.e., 

/($;p,7,x) = ^< b r As\v r A S \0gV rAs - \ogT{v rAs ) - logB(0 rAs 7T rs ,0 rAs (l - 7T rs )) (9) 
r<s k 

+ 4>rAs ^2 (nrslog pij + (1 - TT rs ) log(l - Pij)\ - V rAs ^ (liAj ~ l«g 7*Aj) 
i& ) 

where b rAs denotes the number of dyads in B rAs . 
A. 2 Expectation Step 

Using a vector of estimates $^- ) from iteration t, the EM algorithm requires for the E step the computation of 

E&t) [/($; p, 7, X)\X = x] = E *w Pij,liA3, x i3, x ji)\ x i3 = Xij,Xji = Xji] . (10) 

However, by linearity of the expectation, we just need to compute the five expectations presented below. Using ([7]) 
we hnd 

4 t+1) := E^ t) {\og PlJ \X =x) = tl>( Xij + ~ ^(*«AJ + 4%), 

:= B 4W (log(l - Pij )\X = x)= ^(xji + ^%(1 - 7rg>)) - ^(x iAj + 

where tp(-) represents the digamma function, and 4n A j — <j) rA s, ^ij = ^rs if {hi) G B rs . Using Q and the law 
of total expectation, and integral representations of the hypergeometric and generalized hypergeometric functions, 
we obtain 



liAj^ '■= E ^){liAj\X = x) 



(XiA3+ViAj) E *W\ „, _ _„, _,/, ,/, 

Vij T ji J "T T ]i ~T U iAj 



1 



X = x 



(0 7 (t) 2Fl _ ,At) ' (t) - W ' 
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and 



.,(*+!) 



:= E^(t)(pijj iA j\X = x) 





u lAj) 








(*) 


1 - 


* ( 11 


r- (t) - 


r (t) 











Pij 



+ r (t) 4 



(*) 

J. - 

3 l 



X 



(*) , (*) 



(12) 



where 2^i(") represents the hypergeometric function (see Olde Daalhuis 2010), and ViAj — v r /\s if (hj) € 5 r 
Similarly we obtain 



.(t+i) 



£ , $(t )(log7 lA: ,-|X = jc) 



1p\ x iAj 
1p(%iAj 
( X ij 



+ V. 



V, 



-log( 



(*) 

iAj 

(*) 
iAj 

iAj "ij 



log 



Pij I 
u xAj) 

.(*)\ 

(*) 



.(*) 



iAj) 



) + 



(*) 



(*) 



lAj 



X = x 



lAj IJ 



r (t) 



\A] . 



(13) 
(14) 



where zFi[^) represents the generalized hypergeometric function (see Askey and Olde Daalhuis 2010). Equation 
(0 holds only if \{ T f> + u (t) 



< 1, otherwise we compute (13) using a Monte Carlo approximation for 



Jl ■ ij II V ]l 1 "lAjl 

the unevaluated expectation, taking a large random sample from a beta distribution with parameters as in equation 
and TTrg. Implementations of the hypergeometric and generalized hypergeometric functions are 



@, but 

available in the R package hypergeo (Hankin 2012 1. 



using 



A. 3 Maximization Step 

For the M step we need to find 

= argmax{^ (t) [l($;p,<y,X)\X = x] }. 

From equation ^ we can see that the maximization over $ can be obtained independently over three subsets of 
parameters: {u rAs ;r, s = 1, .. . ,S}, {ir rs , ^ rAs ;r, s = 1,.. .,£?}, and {0, a r , /3 S , p r As] r, s = 1, . .. ,S}. 

In order to maximize with respect to {9, a r ,j3 s , /i rAs ; r, s = 1, . . . , S}, note that these parameters are only involved 
in the Poisson part of the complete likelihood, which allows to estimate them from a Poisson log-linear model of 
quasi-symmetry with offset, i.e., Xy l ~ d Poisson(mjj), where 

log mij =v + Vr+ vf + Vrf + log 7^j +1) , (15) 

if E B rs , with the quasi-symmetry constraint rfff = rf^? , and the usual Vi = Vi = 0, and rj^f = rjff = 
for all r, s. This formulation allows to take advantage of software built-in procedures to estimate generalized 
linear models via maximum likelihood (e.g., glm in R). We use the functional invariance property of MLEs to obtain 
{#(*+!) : Q ! f* +1 ' ) , /3s t+1 ^ , p,rAs^ > r i s = 1) ■ ■ ■ i 3} by exponentiating the corresponding coefficients of the log-linear model, 
e.g., 0( t+1 ) = exp(7?( t+1 >) and so on. We take r^ +1) := 9^ a [ r t+1) & t+1) p^ if (hj) € B rs . 
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In order to maximize over (ir r 



we need to maximize the function 



f {t+1 \TT rs ,(j) rAs ) = -b rAs l0g~B(4> rAs TT rs ,4> rAs (l - 7T rs )) + (f> rAs E (7rr S ^-* +1) + (1 - 7rr S )4i +1) ): 



(i,j)eB r 



and we take (iTrs +1 \ <j>rAs) ~ argmax f^ t+1 \7T rs ,(f) rAs ). However, when r — s, we fix Tr rs — 0.5. Finally, the 



objective function to maximize over v rAa reduces to 

9 (t+1 \v rAs ) = b rAs v rAs \ogv rAs - \ogT(v rAs ) 



and we find z^as — arg max g^ t+1 ^ (v rAs ) . All the previous maximizations can be obtained using an iterative method 
such as the Nelder-Mead algorithm. 

A. 4 Starting Values 

We need to provide the EM algorithm with starting values <J>(°). We propose to compute using some reasonable 
initial measures 7^-, and p\^\ Let x*j = Xij+e (e is, e.g., 0.05), x* f 



for r < s. We add a small s to Xij in order to avoid initial zero 7L , since our proposal for computing <I>( ' does not 



Aj X ij ' X rAs 

(0) 



= (i/W)E 



work otherwise. We take 7^ = x* A j/x* As , for <E B rs ,r < s, since this measure captures the total interaction 

in one for 

the arc in the dyad interaction. Finally, j^ 1 = 7i A jP\"' = x ij/x* rAs 

<ose to use a method of mc 

presented in equation ^ it is easy to check that Var(^fi A j) = l/u rAs , for G B rs , from which we take 



of the dyad, and it has mean one for each dyad-block. We also take pf^ — x*j/x* A j since this captures the share of 



In order to find iri^, and <l>r/\s we propose to use a method of moments approach. From the parametrization 



,(0) 



E 



For r < s, we take nrs = (V&rAs) Yl(i,j)eB rt P^ij (remember -k ss — 0.5 is fixed). It is also easy to check 
E[pij(l - pij)] = ir rs (l - ir rs )(f> rAs /(l + 4> rAs ), for £ B rs , and hence 

, _ E[pij{\ - pij)) 



from which we take 



AO) 

y fAs 



7T rs (l - 7T rs ) - E[pij(l - Pij)] 



(1/brAa) J2i<j,(i,j)£B rs Pij (1 _ % ) 



7T<?(1 ^ 7r£0 - (l/6 rA .) Ei<i,(iJ)6B P . (! ~ ) 



(0), 



Note that using this approach to find 



-(0) 



is appropriate, since it does not actually require an ordering among i 



and j. Finally, we take {0^, ai , /3s°\ /4-As ! s = 1, . . . , 5} from a Poisson log-linear model of quasi-symmetry as 
in Section 



A.3 



taking log 7^ as an offset. 
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